Search CORE

42,467 research outputs found

Gene3D: Multi-domain annotations for protein sequence and comparative genome analysis

Author: Das S
Dawson NL
Dessailly BH
Lee D
Lees JG
Orengo CA
Rentzsch R
Sillitoe I
Studer RA
Yeats C
Publication date: 21/11/2013

Gene3D (http://gene3d.biochem.ucl.ac.uk) is a database of protein domain structure annotations for protein sequences. Domains are predicted using a library of profile HMMs from 2738 CATH superfamilies. Gene3D assigns domain annotations to Ensembl and UniProt sequence sets including >6000 cellular genomes and >20 million unique protein sequences. This represents an increase of 45% in the number of protein sequences since our last publication. Thanks to improvements in the underlying data and pipeline, we see large increases in the domain coverage of sequences. We have expanded this coverage by integrating Pfam and SUPERFAMILY domain annotations, and we now resolve domain overlaps to provide highly comprehensive composite multi-domain architectures. To make these data more accessible for comparative genome analyses, we have developed novel algorithms for searching genomes to identify related multi-domain architectures. In addition to providing domain family annotations, we have now developed a pipeline for 3D homology modelling of domains in Gene3D. This has been applied to the human genome and will be rolled out to other major organisms over the next year.
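The abstract says Gene3D resolves overlaps between CATH, Pfam and SUPERFAMILY hits into a single multi-domain architecture, but it does not describe how. As a minimal sketch, assuming each hit is a scored residue range, one standard way to do this is weighted interval scheduling: choose the non-overlapping set of hits with the largest total score. The `DomainHit` type, the scores and the family labels are hypothetical, and this is not presented as Gene3D's own pipeline.

```python
from bisect import bisect_right
from typing import NamedTuple


class DomainHit(NamedTuple):
    """Hypothetical domain hit: family label, 1-based inclusive range, score."""
    family: str
    start: int
    end: int
    score: float


def resolve_overlaps(hits: list[DomainHit]) -> list[DomainHit]:
    """Return the non-overlapping subset of hits with the largest total score.

    Weighted interval scheduling by dynamic programming; two hits overlap
    if they share at least one residue.
    """
    ordered = sorted(hits, key=lambda h: h.end)
    ends = [h.end for h in ordered]
    # best[i] = best total score achievable with the first i hits (by end)
    best = [0.0] * (len(ordered) + 1)
    for i, hit in enumerate(ordered, start=1):
        # j = number of earlier hits that end before this hit starts
        j = bisect_right(ends, hit.start - 1, 0, i - 1)
        best[i] = max(best[i - 1], best[j] + hit.score)

    # Trace back through the table to recover the chosen hits
    chosen = []
    i = len(ordered)
    while i > 0:
        hit = ordered[i - 1]
        j = bisect_right(ends, hit.start - 1, 0, i - 1)
        if best[j] + hit.score >= best[i - 1]:
            chosen.append(hit)
            i = j
        else:
            i -= 1
    return sorted(chosen, key=lambda h: h.start)


hits = [
    DomainHit("A", 1, 100, 50.0),
    DomainHit("B", 90, 200, 40.0),
    DomainHit("C", 110, 200, 30.0),
]
print([h.family for h in resolve_overlaps(hits)])  # ['A', 'C']
```

Here A and C together (score 80) beat B alone or A alone, because B overlaps A. A greedy "take the best hit first" rule would give the same answer in this example but can fail when one high-scoring hit blocks two compatible ones.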

UCL Discovery

PubMed Central

The Gene3D Web Services: a platform for identifying, annotating and comparing structural domains in protein sequences

Author: C. Orengo
C. Yeats
Delorenzi
I. Sillitoe
J. Lees
Krogh
Meszaros
P. Carter
Ranea
Velankar
Publication venue: Oxford University Press
Publication date: 07/06/2011

The Gene3D structural domain database provides domain annotations for 7 million proteins, based on the manually curated structural domain superfamilies in CATH. These annotations are integrated with functional, genomic and molecular information from external resources, such as GO, EC, UniProt and the NCBI Taxonomy database. We have constructed a set of web services that provide programmatic access to this integrated database, as well as the Gene3D domain recognition tool (Gene3DScan) and protein sequence annotation pipeline for analysing novel protein sequences. Example queries include retrieving all curated GO terms for a domain superfamily or all the multi-domain architectures for the human genome. The services can be accessed using simple HTTP calls and are able to return results in a range of formats for quick downloading and easy parsing, graphical rendering and data storage. Hence, they provide a simple, but flexible means of integrating domain annotations and associated data sets into locally run pipelines and analysis software. The services can be found at http://gene3d.biochem.ucl.ac.uk/WebServices/
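One of the example queries above is retrieving all multi-domain architectures for a genome. The service endpoints and output formats are not described here, so this sketch works offline on hypothetical (protein, superfamily, start) rows, as they might look after parsing a downloaded result; the row layout and the `SF1`-style labels are assumptions. An architecture is taken to be the ordered tuple of superfamilies along the chain.

```python
from collections import Counter, defaultdict


def architectures(assignments):
    """Map each protein to its domain architecture: the tuple of superfamily
    labels ordered by domain start position along the sequence."""
    by_protein = defaultdict(list)
    for protein, superfamily, start in assignments:
        by_protein[protein].append((start, superfamily))
    return {p: tuple(sf for _, sf in sorted(doms)) for p, doms in by_protein.items()}


# Hypothetical parsed rows: (protein_id, superfamily_label, domain_start)
rows = [
    ("P1", "SF1", 10),
    ("P1", "SF2", 250),
    ("P2", "SF2", 300),
    ("P2", "SF1", 20),
    ("P3", "SF3", 5),
]
arch = architectures(rows)
# Keep only multi-domain architectures, then count how often each occurs
multi = {p: a for p, a in arch.items() if len(a) > 1}
counts = Counter(multi.values())
print(counts.most_common())  # [(('SF1', 'SF2'), 2)]
```

Ordering by start position matters: P2 lists its domains out of order, but it still maps to the same architecture as P1.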

Crossref

PubMed Central

UCL Discovery

Statistical analysis of genomic protein family and domain controlled annotations for functional investigation of classified gene lists

Author: Bellistri Elisa
Franceschini Andrea
Masseroli Marco
Pinciroli Francesco
Publication venue: BioMed Central
Publication date: 01/01/2007

Background: The growing number of protein family and domain annotations constitutes important information for understanding protein functions and gaining insight into the relations among the genes that encode them. To allow the analysis of gene proteomic annotations, we implemented novel modules within GFINDer, a Web system we previously developed that dynamically aggregates functional and phenotypic annotations of user-uploaded gene lists and supports their statistical analysis and mining.

Results: Exploiting protein information in the Pfam and InterPro databanks, we developed and added to GFINDer original modules specifically devoted to the exploration and analysis of functional signatures of gene protein products. They allow annotating numerous user-classified nucleotide sequence identifiers with controlled information on related protein families, domains and functional sites, classifying them according to such protein annotation categories, and statistically analyzing the resulting classifications. In particular, when uploaded nucleotide sequence identifiers are subdivided into classes, the Statistics Protein Families&Domains module estimates the relevance of Pfam or InterPro controlled annotations for the uploaded genes by highlighting protein signatures that are significantly over-represented within user-defined classes of genes. In addition, the Logistic Regression module identifies the protein functional signatures that best explain the considered gene classification.

Conclusion: The novel GFINDer modules provide genomic protein family and domain analyses that support better functional interpretation of gene classes, for instance those defined through statistical and clustering analyses of gene expression results from microarray experiments. They can therefore help in understanding fundamental biological processes and complex cellular mechanisms influenced by protein domain composition, and contribute to unveiling new biomedical knowledge about the encoding genes.
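The Statistics Protein Families&Domains module highlights signatures over-represented in a class of genes. The abstract does not name the test it applies; a common choice for this kind of over-representation question is a one-sided hypergeometric (Fisher's exact) test, sketched below under that assumption. `hypergeom_sf` is a hypothetical helper, the counts in the example are made up, and no multiple-testing correction is applied.

```python
from math import comb


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population N, K annotated, n drawn).

    N: all uploaded genes; K: genes carrying the signature;
    n: genes in the class of interest; k: class genes carrying the signature.
    """
    total = comb(N, n)
    upper = min(K, n)
    # Sum the probabilities of seeing k or more annotated genes in the class
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


# 20 uploaded genes, 5 carry a given family; a class of 6 genes holds 4 of them
p = hypergeom_sf(4, 20, 5, 6)
print(round(p, 4))  # 0.0139
```

Under random assignment only about 1.5 of the 6 class genes would be expected to carry the signature, so seeing 4 gives a small upper-tail probability.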

Archivio istituzionale della ricerca - Politecnico di Milano

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Domain-mediated interactions for protein subfamily identification

Author: Han S.K.
Kim D.
Kim I.
KIM SANGUK
Kong J.
Lee H.
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2020

Within a protein family, proteins with the same domain often exhibit different cellular functions, despite the shared evolutionary history and molecular function of the domain. We hypothesized that domain-mediated interactions (DMIs) may divide a protein family into subfamilies, because the diversified functions of a single domain often depend on the domain's interacting partners. Here we systematically identified DMI subfamilies, in which proteins share domains with DMI partners, as well as with various functional and physical interaction networks in individual species. In humans, DMI subfamily members are associated with similar diseases, including cancers, and are frequently co-associated with the same diseases. DMI information relates to the functional and evolutionary subdivisions of human kinases. In yeast, DMI subfamilies contain proteins with similar phenotypic outcomes from specific chemical treatments. Therefore, the systematic investigation here provides insights into the diverse functions of subfamilies derived from a protein family with a link-centric approach and suggests a useful resource for annotating the functions and phenotypic outcomes of proteins.
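The abstract does not spell out how DMI subfamilies are built. Under a simplified reading, assumed here, proteins that carry domain D are grouped by the partner domain E of a known D-E interaction that is present in one of their interaction partners. The function name and all protein and domain labels below are hypothetical.

```python
from collections import defaultdict


def dmi_subfamilies(domains, ppi, dmis):
    """Group proteins by (own domain D, partner domain E) for known D-E DMIs.

    domains: protein -> set of domain labels
    ppi: iterable of (protein, protein) pairs, treated as undirected
    dmis: set of frozenset({D, E}) known interacting domain pairs
    Returns {(D, E): set of proteins carrying D with an interactor carrying E}.
    """
    neighbours = defaultdict(set)
    for a, b in ppi:
        neighbours[a].add(b)
        neighbours[b].add(a)

    groups = defaultdict(set)
    for protein, own in domains.items():
        for partner in neighbours[protein]:
            for d in own:
                for e in domains.get(partner, ()):
                    if frozenset((d, e)) in dmis:
                        groups[(d, e)].add(protein)
    return dict(groups)


domains = {"K1": {"D"}, "K2": {"D"}, "K3": {"D"}, "S1": {"E1"}, "P1": {"E2"}}
ppi = [("K1", "S1"), ("K2", "S1"), ("K3", "P1")]
dmis = {frozenset({"D", "E1"}), frozenset({"D", "E2"})}
groups = dmi_subfamilies(domains, ppi, dmis)
print(sorted(groups[("D", "E1")]), sorted(groups[("D", "E2")]))  # ['K1', 'K2'] ['K3']
```

All three proteins share domain D, but they split into two subfamilies according to which DMI partner domain their interactors carry.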

Pohang University of Science and Technology (POSTECH)

Biases in the Experimental Annotations of Protein Function and their Effect on Our Understanding of Protein Function Space

Author: Babbitt Patricia C.
Friedberg Iddo
Ream David C.
Schnoes Alexandra M.
Thorman Alexander W.
Publication venue: Public Library of Science (PLoS)
Publication date: 03/04/2013

The ongoing functional annotation of proteins relies upon the work of curators to capture experimental findings from scientific literature and apply them to protein sequence and structure data. However, with the increasing use of high-throughput experimental assays, a small number of experimental studies dominate the functional protein annotations collected in databases. Here we investigate just how prevalent the "few articles -- many proteins" phenomenon is. We examine the experimentally validated annotation of proteins provided by several groups in the GO Consortium, and show that the distribution of proteins per published study is exponential, with 0.14% of articles providing the source of annotations for 25% of the proteins in the UniProt-GOA compilation. Since each of the dominant articles describes the use of an assay that can find only one function or a small group of functions, this leads to substantial biases in what we know about the function of many proteins. Mass spectrometry, microscopy and RNAi experiments dominate high-throughput experiments. Consequently, the functional information derived from these experiments mostly concerns the subcellular location of proteins and their participation in embryonic developmental pathways. For some organisms, the information provided by different studies overlaps substantially. We also show that the information provided by high-throughput experiments is less specific than that provided by low-throughput experiments. Given the experimental techniques available, certain biases in protein function annotation due to high-throughput experiments are unavoidable. Knowing that these biases exist and understanding their characteristics and extent is important for database curators, developers of function annotation programs, and anyone who uses protein function annotation data to plan experiments.

Comment: Accepted to PLoS Computational Biology. Press embargo applies. v4: text corrected for style and supplementary material inserted.
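The "0.14% of articles for 25% of proteins" figure is a cumulative-coverage statistic. A minimal sketch of how such a number can be computed, assuming a mapping from each article to the set of proteins it annotates: rank articles by how many proteins they annotate, then count how many are needed before their combined proteins reach the target share. The function name and toy data are invented, and this greedy ranking is not necessarily the authors' exact procedure.

```python
def article_fraction_for_coverage(article_proteins: dict[str, set[str]], target: float = 0.25) -> float:
    """Fraction of articles, taken largest first, whose combined annotated
    proteins first reach `target` of all annotated proteins."""
    all_proteins = set().union(*article_proteins.values())
    ranked = sorted(article_proteins.values(), key=len, reverse=True)
    covered: set[str] = set()
    for used, proteins in enumerate(ranked, start=1):
        covered |= proteins
        if len(covered) >= target * len(all_proteins):
            return used / len(ranked)
    return 1.0


# Toy data: one large screen annotating 6 proteins, six small studies with 1 each
articles = {"big_screen": {f"p{i}" for i in range(6)}}
articles.update({f"small_{i}": {f"p{i}"} for i in range(6, 12)})

print(article_fraction_for_coverage(articles, 0.25))  # one article out of seven
print(article_fraction_for_coverage(articles, 0.75))  # four articles out of seven
```

Using the union of proteins rather than a sum avoids double counting proteins annotated by several articles, which matters given the overlap between studies noted above.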

arXiv.org e-Print Archive

Directory of Open Access Journals

PubMed Central

FigShare

InterPro in 2017-beyond protein family and domain annotations

InterPro (http://www.ebi.ac.uk/interpro/) is a freely available database used to classify protein sequences into families and to predict the presence of important domains and sites. InterProScan is the underlying software that allows both protein and nucleic acid sequences to be searched against InterPro's predictive models, which are provided by its member databases. Here, we report recent developments with InterPro and its associated software, including the addition of two new databases (SFLD and CDD), and the functionality to include residue-level annotation and prediction of intrinsic disorder. These developments enrich the annotations provided by InterPro, increase the overall number of residues annotated and allow more specific functional inferences.
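New member databases and residue-level features raise the number of residues annotated only where their ranges fall outside existing ones, because overlapping annotations cover the same residues. A minimal sketch of counting annotated residues from (start, end) ranges, assuming 1-based inclusive coordinates; `residues_annotated` is a hypothetical helper, and the ranges in the example do not reflect InterProScan's output format.

```python
def residues_annotated(intervals: list[tuple[int, int]]) -> int:
    """Count residues covered by at least one (start, end) range.

    Coordinates are 1-based and inclusive; overlapping or adjacent ranges
    are merged so each residue is counted once.
    """
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end + 1:
            # Gap before this range: close the current merged block
            if cur_end is not None:
                total += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start + 1
    return total


# Hypothetical ranges: a domain, an overlapping disordered region, one site residue
annotations = [(10, 60), (55, 80), (100, 100)]
print(residues_annotated(annotations))  # 72
```

For a 120-residue sequence this would give 72/120 = 60% residue coverage; adding the disordered region raised the count by 20 residues, not by its full length of 26.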

PubMed Central

eScholarship - University of California

The University of Manchester - Institutional Repository

Explore Bristol Research

Archivio istituzionale della ricerca - Università di Padova

InterPro in 2017-beyond protein family and domain annotations

Author: Attwood TK
Babbitt PC
Bateman A
Bork P
Bridge AJ
Chang HY
Dosztányi Z
El-Gebali S
Finn RD
Fraser M
Gough J
Haft D
Holliday GL
Huang H
Huang X
Letunic I
Lopez R
Lu S
Marchler-Bauer A
Mi H
Mistry J
Mitchell AL
Natale DA
Necci M
Nuka G
Orengo CA
Park Y
Pesseat S
Piovesan D
Potter SC
Rawlings ND
Redaschi N
Richardson L
Rivoire C
Sangrador-Vegas A
Sigrist C
Sillitoe I
Smithers B
Squizzato S
Sutton G
Thanki N
Thomas PD
Tosatto SC
Wu CH
Xenarios I
Yeh LS
Young SY
Publication date: 29/11/2016

UCL Discovery

Toll-like receptor signaling in vertebrates: Testing the integration of protein, complex, and pathway data in the Protein Ontology framework

Author: Arighi Cecilia
D’Eustachio Peter
Masci Anna Maria
Natale Darren
Ruttenberg Alan
Shamovsky Veronica
Smith Barry
Wu Cathy
Publication date: 01/01/2015

The Protein Ontology (PRO) provides terms for and supports annotation of species-specific protein complexes in an ontology framework that relates them both to their components and to species-independent families of complexes. Comprehensive curation of experimentally known forms and annotations thereof is expected to expose discrepancies, differences, and gaps in our knowledge. We have annotated the early events of innate immune signaling mediated by Toll-like receptor 3 and 4 complexes in human, mouse, and chicken. The resulting ontology and annotation data set has allowed us to identify species-specific gaps in experimental data and possible functional differences between species, and to employ inferred structural and functional relationships to suggest plausible resolutions of these discrepancies and gaps.
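The framework described above links each species-specific complex to its components and to a species-independent family. A deliberately simplified, hypothetical model of that structure (not PRO's actual schema or file format) is enough to show how gaps and cross-species differences can be read off it: a family with no term for some species is a gap, and components present in only some species-specific forms are candidate differences. Component labels are assumed to be species-independent so they can be compared across species; all identifiers are invented.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ComplexTerm:
    """Hypothetical species-specific complex term."""
    term_id: str
    family: str  # species-independent family term (its is_a parent)
    species: str
    components: frozenset = field(default_factory=frozenset)


def species_gaps(terms, species):
    """For each family, list the species that have no species-specific term."""
    present = defaultdict(set)
    for term in terms:
        present[term.family].add(term.species)
    return {
        family: sorted(set(species) - seen)
        for family, seen in present.items()
        if set(species) - seen
    }


def component_differences(terms, family):
    """Components found in some, but not all, species-specific forms of a family."""
    forms = [t.components for t in terms if t.family == family]
    if not forms:
        return set()
    return set(frozenset.union(*forms) - frozenset.intersection(*forms))


terms = [
    ComplexTerm("cx:1", "family_X", "human", frozenset({"p1", "p2", "p3"})),
    ComplexTerm("cx:2", "family_X", "mouse", frozenset({"p1", "p2"})),
]
print(species_gaps(terms, ["human", "mouse", "chicken"]))  # {'family_X': ['chicken']}
print(component_differences(terms, "family_X"))  # {'p3'}
```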

PhilPapers

Directory of Open Access Journals

PubMed Central

University of Delaware Library Institutional Repository

SIFTER search: a web server for accurate phylogeny-based protein function prediction.

Author: Brenner Steven E
Luo Kevin R
Sahraeian Sayed M
Publication venue: eScholarship, University of California
Publication date: 01/01/2015

We are awash in proteins discovered through high-throughput sequencing projects. As only a minuscule fraction of these have been experimentally characterized, computational methods are widely used for automated annotation. Here, we introduce a user-friendly web interface for accurate protein function prediction using the SIFTER algorithm. SIFTER is a state-of-the-art sequence-based gene molecular function prediction algorithm that uses a statistical model of function evolution to incorporate annotations throughout the phylogenetic tree. Due to the resources needed by the SIFTER algorithm, running SIFTER locally is not trivial for most users, especially for large-scale problems. The SIFTER web server thus provides access to precomputed predictions on 16 863 537 proteins from 232 403 species. Users can explore SIFTER predictions with queries for proteins, species, functions, and homologs of sequences not in the precomputed prediction set. The SIFTER web server is accessible at http://sifter.berkeley.edu/ and the source code can be downloaded.
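SIFTER's model of function evolution is richer than can be shown here. As a toy illustration of the underlying idea only, assuming a single binary function (present or absent), a fixed per-branch flip probability and a uniform root prior, Felsenstein's pruning algorithm gives the posterior that an unannotated leaf has the function, given annotated relatives in the tree. The tree, the `flip` value and the helper names are hypothetical; this is not SIFTER's algorithm.

```python
# Toy model: one binary function per protein, 0 = absent, 1 = present.
Tree = dict[str, list[str]]


def partial_likelihood(node: str, tree: Tree, evidence: dict[str, int], flip: float) -> list[float]:
    """Felsenstein pruning: P(observed leaves below `node` | state of `node`)."""
    children = tree.get(node, [])
    if not children:  # leaf: observed state, or uninformative if unannotated
        state = evidence.get(node)
        return [1.0, 1.0] if state is None else [float(state == 0), float(state == 1)]
    result = [1.0, 1.0]
    for child in children:
        child_lik = partial_likelihood(child, tree, evidence, flip)
        for s in (0, 1):
            # State kept along the branch with prob 1 - flip, switched with prob flip
            result[s] *= (1 - flip) * child_lik[s] + flip * child_lik[1 - s]
    return result


def posterior_present(tree: Tree, root: str, evidence: dict[str, int], query: str, flip: float = 0.1) -> float:
    """Posterior probability that leaf `query` has the function (uniform root prior)."""
    joint = []
    for s in (0, 1):
        root_lik = partial_likelihood(root, tree, {**evidence, query: s}, flip)
        joint.append(0.5 * root_lik[0] + 0.5 * root_lik[1])
    return joint[1] / (joint[0] + joint[1])


tree = {"root": ["X", "C"], "X": ["A", "B"]}
evidence = {"A": 1, "C": 0}  # A has the function, C lacks it, B is unannotated
p = posterior_present(tree, "root", evidence, "B", flip=0.1)
print(round(p, 3))  # 0.631
```

B is pulled towards its closest relative A (which has the function) rather than the more distant C, which is the intuition behind using the whole phylogeny instead of a single best hit.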

CiteSeerX

PubMed Central

eScholarship - University of California